Conversation

@quaquel (Member) commented Nov 18, 2025

This PR addresses #2884.

It does several things:

  1. Adds a decorator for annotating methods that currently take seed or random as a kwarg. The decorator issues a FutureWarning when seed or random is used instead of the new preferred rng kwarg. I use FutureWarning because this is intended for end users (see https://docs.python.org/3/library/warnings.html). A sketch of such a decorator follows this list.
  2. Adds this decorator to all methods that have seed or random as a kwarg.
  3. Starts adding rng to all decorated classes and updates their inner workings to use numpy generators instead of stdlib random generators.
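
For illustration, a minimal sketch of what such a decorator could look like (the decorator name and warning message below are placeholders, not the exact implementation in this PR):

import functools
import warnings

def warn_deprecated_rng_kwargs(func):
    # Hypothetical sketch: issue a FutureWarning when the legacy
    # seed/random kwargs are passed instead of rng.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for old_kwarg in ("seed", "random"):
            if old_kwarg in kwargs:
                warnings.warn(
                    f"The {old_kwarg!r} keyword argument is deprecated; use 'rng' instead.",
                    FutureWarning,
                    stacklevel=2,
                )
        return func(*args, **kwargs)
    return wrapper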

Reminder: we need to fix the set_seed widget on the Solara side as well!

@quaquel added the enhancement, trigger-benchmarks (triggers the benchmarking CI), and deprecation labels on Nov 18, 2025
@quaquel linked an issue on Nov 18, 2025 that may be closed by this pull request: Deprecating stdlib random
@quaquel removed and re-added the trigger-benchmarks label several times on Nov 18, 2025 to re-trigger the benchmarking CI; the earlier @github-actions benchmark comments were marked as outdated.
@github-actions

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔴 +15.2% [+13.8%, +16.6%] 🔴 +65.9% [+65.5%, +66.2%]
BoltzmannWealth large 🔴 +5.5% [+4.9%, +6.1%] 🔴 +56.0% [+54.0%, +58.1%]
Schelling small 🔴 +7.3% [+7.1%, +7.5%] 🔵 -1.7% [-1.8%, -1.5%]
Schelling large 🔵 +2.7% [+2.2%, +3.2%] 🔴 +78.0% [+76.6%, +79.2%]
WolfSheep small 🔴 +24.2% [+23.6%, +24.7%] 🔴 +35.6% [+28.2%, +43.3%]
WolfSheep large 🔴 +22.6% [+21.5%, +23.5%] 🔴 +31.1% [+29.4%, +32.5%]
BoidFlockers small 🔵 +0.4% [-0.4%, +1.2%] 🔵 +0.2% [-0.1%, +0.5%]
BoidFlockers large 🔵 +1.0% [+0.4%, +1.7%] 🔵 +0.8% [+0.6%, +1.0%]

@quaquel (Member, Author) commented Nov 19, 2025

As can be seen in the performance benchmarks, shifting from stdlib random to numpy.random.default_rng comes with a performance hit. I dug into this a bit more. The two main random calls made in these benchmark models are shuffle and choice. np.random.default_rng().shuffle() is about 20% faster than random.shuffle(). However, np.random.default_rng().choice() is several orders of magnitude slower than random.choice(). In fact, for the specific case of choosing a single item randomly from a collection, it is advised to use collection[rng.integers(0, len(collection))] instead, but this is still twice as expensive as random.choice().
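
For concreteness, a minimal standalone sketch of the two single-item patterns mentioned above (the agents list is just an example collection):

import random
import numpy as np

rng = np.random.default_rng(42)
agents = list(range(10_000))

pick = random.choice(agents)                   # stdlib: cheap for a single pick
pick = agents[rng.integers(0, len(agents))]    # numpy: advised single-item pattern, still ~2x more expensive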

More broadly, numpy.random.default_rng() is designed for generating and operating on numpy arrays. So, calls that create a single random number (e.g., rng.random(), rng.integers()) are all much slower than their stdlib random alternatives. However, once the number of draws becomes larger, they become increasingly faster than a list comprehension with stdlib random. So, shifting to rng comes with additional implementation considerations to work around this performance hit. See some of the updated examples for what that can look like.
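
As a hedged illustration of such a workaround (the names and probability here are made up, not taken from the updated examples): draw everything a step needs in one vectorized call instead of once per agent.

import numpy as np

rng = np.random.default_rng(42)
n_agents = 10_000
p_act = 0.1

draws = rng.random(n_agents)              # one vectorized call instead of n_agents separate calls
actors = np.flatnonzero(draws < p_act)    # indices of agents that act this step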

Thoughts are welcome. In your opinion, are the improved quality of the random number generators and the fact that numpy gives you local random state worth the performance hit?

@quaquel removed and re-added the trigger-benchmarks label on Nov 20, 2025 to re-trigger the benchmarking CI
@quaquel (Member, Author) commented Nov 20, 2025

Given how common it is to draw a single random number (i.e., random.random()), I was wondering about adding a shorthand for this and seeing how fast it would be.

import random  # used in the timings below
import numpy as np

rng = np.random.default_rng()

def test_rand(rng, initial_size=100):
    # Pre-generate a batch of random numbers and yield them one at a time,
    # refilling the buffer once it is exhausted.
    initial_values = rng.random(size=initial_size)

    while True:
        for entry in initial_values:
            yield entry
        initial_values = rng.random(size=initial_size)

def draw():
    return next(a)

a = test_rand(rng, 200)

And then I timed these as shown below

%%timeit
[random.random() for _ in range(250)]
8.88 μs ± 72.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%%timeit
[draw() for _ in range(250)]
20.7 μs ± 130 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
[next(a) for _ in range(250)]
16.4 μs ± 121 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%%timeit
[rng.random() for _ in range(250)]
67.9 μs ± 273 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

So, a naive shift from random.random() to rng.random() is almost an order of magnitude slower. We can cut this down substantially by using the generator and advancing it directly via next (16.4 μs vs. 67.9 μs); wrapping that next call in another function (draw) adds some overhead (20.7 μs), but it is still a lot faster than a naive replacement. So, I am considering adding a method to the model, model.rand, that performs this task. The other option would be to figure out how to subclass numpy.random.Generator and add this rand method to it, because then you could just do self.rng.rand() as a much faster alternative to rng.random().
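
A minimal sketch of what such a buffered model.rand could look like (a standalone stand-in with placeholder names, not the library's actual Model class or a settled API):

import numpy as np

class Model:
    def __init__(self, seed=None, buffer_size=256):
        self.rng = np.random.default_rng(seed)
        self._buffer_size = buffer_size
        self._rand_buffer = iter(self.rng.random(size=buffer_size))

    def rand(self):
        # Return one uniform float in [0, 1), refilling the buffer when it runs out.
        try:
            return next(self._rand_buffer)
        except StopIteration:
            self._rand_buffer = iter(self.rng.random(size=self._buffer_size))
            return next(self._rand_buffer)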

@github-actions

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔴 +17.0% [+15.3%, +18.7%] 🔴 +50.4% [+50.0%, +50.8%]
BoltzmannWealth large 🔴 +9.1% [+7.3%, +10.5%] 🔴 +60.8% [+55.1%, +65.5%]
Schelling small 🔴 +16.5% [+16.1%, +16.9%] 🔵 +3.6% [+2.9%, +4.3%]
Schelling large 🔴 +10.6% [+9.0%, +12.3%] 🔴 +80.4% [+70.8%, +91.3%]
WolfSheep small 🔴 +6.0% [+5.4%, +6.5%] 🔴 +39.3% [+32.2%, +47.1%]
WolfSheep large 🔴 +6.6% [+5.5%, +7.9%] 🔴 +42.3% [+38.9%, +45.7%]
BoidFlockers small 🔵 +3.2% [+2.7%, +3.7%] 🔵 +1.9% [+1.6%, +2.3%]
BoidFlockers large 🔴 +3.6% [+3.2%, +3.9%] 🔵 +1.9% [+1.4%, +2.4%]

@EwoutH (Member) commented Nov 21, 2025

> More broadly, numpy.random.default_rng() is designed for generating and operating on numpy arrays.

I don't know if it's a good idea, but: can you pre-generate an array (of, say, 100 or 1000 numbers), use those numbers, and every time you have used up the array, generate a new one? Basically, generate random numbers in batches?
